NSF PAR Search | NSF Public Access Repository


PSBD: Prediction shift uncertainty unlocks backdoor detection

Li, Wei; Chen, Pin-Yu; Liu, Sijia; Wang, Ren (July 2025, Proceedings of the Computer Vision and Pattern Recognition Conference)

Full Text Available
Revisiting Mode Connectivity in Neural Networks with Bezier Surface

Ren, Jie; Chen, Pin-Yu; Wang, Ren (April 2025, The Thirteenth International Conference on Learning Representations)

Understanding the loss landscapes of neural networks (NNs) is critical for optimizing model performance. Previous research has identified the phenomenon of mode connectivity on curves, where two well-trained NNs can be connected by a continuous path in parameter space along which the loss remains nearly constant. In this work, we extend the concept of mode connectivity from curves to surfaces, significantly broadening its applicability. While initial attempts to connect models via linear surfaces in parameter space were unsuccessful, we propose a novel optimization technique that consistently discovers low-loss, high-accuracy Bézier surfaces connecting multiple NNs in a nonlinear manner. We further demonstrate that even without optimization, mode connectivity exists in certain cases of Bézier surfaces, where the models are carefully selected and combined linearly. This approach provides a deeper and more comprehensive understanding of the loss landscape and offers a novel way to identify models with enhanced performance for model averaging and output ensembling. We demonstrate the effectiveness of our method on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets using VGG16, ResNet18, and ViT architectures.
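As a rough illustration of what a Bézier surface in parameter space looks like, the sketch below evaluates a surface point from a grid of control points using Bernstein polynomials. It is only a toy, not the paper's optimization technique: the function names, the 2 x 2 grid, and the two-dimensional "parameter vectors" are assumptions made for illustration.

```python
from math import comb


def bernstein(n, i, t):
    """Bernstein basis polynomial B_i^n(t) for t in [0, 1]."""
    return comb(n, i) * (t ** i) * ((1 - t) ** (n - i))


def bezier_surface_point(control, u, v):
    """Evaluate a Bezier surface at (u, v).

    `control` is an (m+1) x (n+1) grid of parameter vectors,
    each given as a list of floats of equal length.
    """
    m = len(control) - 1
    n = len(control[0]) - 1
    dim = len(control[0][0])
    point = [0.0] * dim
    for i in range(m + 1):
        for j in range(n + 1):
            weight = bernstein(m, i, u) * bernstein(n, j, v)
            for k in range(dim):
                point[k] += weight * control[i][j][k]
    return point


# Hypothetical toy grid: four "models", each a 2-dimensional parameter vector.
corners = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [1.0, 1.0]],
]
center = bezier_surface_point(corners, 0.5, 0.5)
```

With a 2 x 2 grid the surface reduces to bilinear interpolation between the four corner models; larger grids add interior control points, which is where an optimization procedure would have freedom to bend the surface toward low-loss regions.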
Full Text Available
Modular Prompt Learning Improves Vision-Language Models

https://doi.org/10.1109/ICASSP49660.2025.10889690

Huang, Zhenhan; Pedapati, Tejaswini; Chen, Pin-Yu; Gao, Jianxi (April 2025, IEEE)

Full Text Available
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis

Li, Hongkang; Lu, Songtao; Chen, Pin-Yu; Cui, Xiaodong; Wang, Meng (April 2025, The Thirteenth International Conference on Learning Representations (ICLR))

Full Text Available
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers

Li, Hongkang; Zhang, Yihua; Zhang, Shuai; Wang, Meng; Liu, Sijia; Chen, Pin-Yu (May 2025, 2025 International Conference on Learning Representations (ICLR))

Task arithmetic refers to editing the pre-trained model by adding a weighted sum of task vectors, each of which is the weight update from the pre-trained model to a model fine-tuned for a certain task. This approach recently gained attention as a computationally efficient inference method for model-editing goals such as multi-task learning, forgetting, and out-of-domain generalization. However, the theoretical understanding of why task vectors can execute various conceptual operations remains limited, due to the high non-convexity of training Transformer-based models. To the best of our knowledge, this paper provides the first theoretical characterization of the generalization guarantees of task vector methods on nonlinear Transformers. We consider a conceptual learning setting, where each task is a binary classification problem based on a discriminative pattern. We theoretically prove the effectiveness of task addition in simultaneously learning a set of irrelevant or aligned tasks, as well as the success of task negation in unlearning one task from irrelevant or contradictory tasks. Moreover, we prove the proper selection of linear coefficients for task arithmetic to achieve guaranteed generalization to out-of-domain tasks. All of our theoretical results hold for both dense-weight parameters and their low-rank approximations. Although established in a conceptual setting, our theoretical findings were validated on a practical machine unlearning task using the large language model Phi-1.5 (1.3B).
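The task-arithmetic operation described in this abstract can be sketched in a few lines. The example below is a minimal illustration on flat lists of floats rather than real model weights; the function names and the toy numbers are assumptions, and it says nothing about how the paper chooses the coefficients.

```python
def task_vector(pretrained, finetuned):
    """Task vector: fine-tuned weights minus pre-trained weights."""
    return [f - p for p, f in zip(pretrained, finetuned)]


def apply_task_arithmetic(pretrained, task_vectors, coefficients):
    """Return pretrained + sum_i coefficients[i] * task_vectors[i]."""
    edited = list(pretrained)
    for tau, lam in zip(task_vectors, coefficients):
        edited = [w + lam * t for w, t in zip(edited, tau)]
    return edited


# Hypothetical toy weights for a pre-trained model and two fine-tuned models.
pretrained = [1.0, 2.0, 3.0]
finetuned_a = [1.5, 2.0, 3.0]
finetuned_b = [1.0, 2.0, 2.0]

tau_a = task_vector(pretrained, finetuned_a)
tau_b = task_vector(pretrained, finetuned_b)

# Task addition: positive coefficients combine both tasks.
added = apply_task_arithmetic(pretrained, [tau_a, tau_b], [1.0, 1.0])

# Task negation: a negative coefficient moves away from task A.
negated = apply_task_arithmetic(pretrained, [tau_a], [-1.0])
```

Positive coefficients correspond to task addition and negative ones to task negation; the choice of coefficient values is exactly what the abstract's generalization analysis addresses.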
Full Text Available
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers

Li, Hongkang; Zhang, Yihua; Zhang, Shuai; Chen, Pin-Yu; Liu, Sijia; Wang, Meng (April 2025, The Thirteenth International Conference on Learning Representations (ICLR))

Full Text Available
CENSOR: Defense Against Gradient Inversion via Orthogonal Subspace Bayesian Sampling

https://doi.org/10.14722/ndss.2025.230915

Zhang, Kaiyuan; Cheng, Siyuan; Shen, Guangyu; Ribeiro, Bruno; An, Shengwei; Chen, Pin-Yu; Zhang, Xiangyu; Li, Ninghui (February 2025, Internet Society)

Federated learning collaboratively trains a neural network on a global server, where each local client receives the current global model weights and sends back parameter updates (gradients) based on its local private data. The process of sending these model updates may leak a client's private data. Existing gradient inversion attacks can exploit this vulnerability to recover private training instances from a client's gradient vectors. Recently, researchers have proposed advanced gradient inversion techniques that existing defenses struggle to handle effectively. In this work, we present a novel defense tailored for large neural network models. Our defense capitalizes on the high dimensionality of the model parameters to perturb gradients within a subspace orthogonal to the original gradient. By leveraging cold posteriors over orthogonal subspaces, our defense implements a refined gradient update mechanism. This enables the selection of an optimal gradient that not only safeguards against gradient inversion attacks but also maintains model utility. We conduct comprehensive experiments across three different datasets and evaluate our defense against various state-of-the-art attacks and defenses. Code is available at https://censor-gradient.github.io.
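To make the idea of perturbing a gradient within a subspace orthogonal to it more concrete, the sketch below draws Gaussian noise and removes its component along the gradient. This is not the CENSOR algorithm: the cold-posterior sampling and the selection of an optimal gradient are omitted, and the function names, noise scale, and toy gradient are assumptions for illustration only.

```python
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonal_perturbation(gradient, scale, rng):
    """Sample Gaussian noise and project out its component along `gradient`."""
    noise = [rng.gauss(0.0, 1.0) for _ in gradient]
    g_norm_sq = dot(gradient, gradient)
    if g_norm_sq == 0.0:
        # A zero gradient defines no direction, so there is nothing to project out.
        return [scale * n for n in noise]
    coeff = dot(noise, gradient) / g_norm_sq
    return [scale * (n - coeff * g) for n, g in zip(noise, gradient)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
gradient = [0.3, -1.2, 0.7, 2.0]  # hypothetical client gradient
delta = orthogonal_perturbation(gradient, 0.1, rng)
perturbed = [g + d for g, d in zip(gradient, delta)]
```

Because the perturbation is orthogonal to the gradient, the perturbed update keeps the same component along the original gradient direction while changing everything else, which is the geometric intuition behind preserving utility.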
Full Text Available
Network properties determine neural network performance

https://doi.org/10.1038/s41467-024-48069-8

Jiang, Chunheng; Huang, Zhenhan; Pedapati, Tejaswini; Chen, Pin-Yu; Sun, Yizhou; Gao, Jianxi (December 2024, Nature Communications)

Machine learning influences numerous aspects of modern society, empowers new technologies, from AlphaGo to ChatGPT, and increasingly materializes in consumer products such as smartphones and self-driving cars. Despite the vital role and broad applications of artificial neural networks, we lack systematic approaches, such as network science, to understand their underlying mechanism. The difficulty is rooted in many possible model configurations, each with different hyper-parameters and weighted architectures determined by noisy data. We bridge the gap by developing a mathematical framework that maps the neural network’s performance to the network characters of the line graph governed by the edge dynamics of stochastic gradient descent differential equations. This framework enables us to derive a neural capacitance metric to universally capture a model’s generalization capability on a downstream task and predict model performance using only early training results. The numerical results on 17 pre-trained ImageNet models across five benchmark datasets and one NAS benchmark indicate that our neural capacitance metric is a powerful indicator for model selection based only on early training results and is more efficient than state-of-the-art methods.
Full Text Available
SepsisLab: Early Sepsis Prediction with Uncertainty Quantification and Active Sensing

https://doi.org/10.1145/3637528.3671586

Yin, Changchang; Chen, Pin-Yu; Yao, Bingsheng; Wang, Dakuo; Caterino, Jeffrey; Zhang, Ping (August 2024, ACM)

Full Text Available
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

Qi, Xiangyu; Zeng, Yi; Xie, Tinghao; Chen, Pin-Yu; Jia, Ruoxi; Mittal, Prateek; Henderson, Peter (May 2024, ICLR)

Full Text Available
